 sharper bound


Stability and Generalization of Asynchronous SGD: Sharper Bounds Beyond Lipschitz and Smoothness

Neural Information Processing Systems

Asynchronous stochastic gradient descent (ASGD) has evolved into an indispensable optimization algorithm for training modern large-scale distributed machine learning tasks. Therefore, it is imperative to explore the generalization performance of the ASGD algorithm. However, the existing results are either pessimistic and vacuous or restricted by strict assumptions that fail to reveal the intrinsic impact of asynchronous training on generalization. In this study, we establish sharper stability and generalization bounds for ASGD under much weaker assumptions. Firstly, this paper studies the on-average model stability of ASGD and provides a non-vacuous upper bound on the generalization error, without relying on the Lipschitz assumption.
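The role of stale gradients in asynchronous training can be illustrated with a small simulation. This is only a minimal sketch and not the setting analysed in the paper: the 1-D least-squares objective, the uniform random delay model, and the names `simulate_asgd` and `max_delay` are all assumptions made for illustration.

```python
import random


def simulate_asgd(data, lr=0.05, max_delay=3, steps=2000, seed=0):
    """Simulate asynchronous SGD on a 1-D least-squares problem.

    Each update uses a gradient evaluated at a stale iterate, i.e. the
    parameter value from up to `max_delay` updates ago. The uniform delay
    model is a hypothetical stand-in for real worker staleness.
    """
    rng = random.Random(seed)
    history = [0.0]  # every iterate so far; history[-1] is the current one
    for _ in range(steps):
        x, y = rng.choice(data)  # sample one training example
        delay = rng.randint(0, min(max_delay, len(history) - 1))
        stale_w = history[-1 - delay]  # iterate seen by the delayed worker
        grad = (stale_w * x - y) * x  # gradient of 0.5 * (w * x - y) ** 2
        history.append(history[-1] - lr * grad)  # apply to current iterate
    return history[-1]


# Noiseless data generated from y = 2x, so the minimiser is w = 2.
data = [(i / 10.0, 2.0 * i / 10.0) for i in range(1, 11)]
w = simulate_asgd(data)
print(w)
```

Setting `max_delay=0` recovers ordinary sequential SGD, which makes it easy to compare the two trajectories on the same data.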


Sharper Concentration Inequalities for Multi-Graph Dependent Variables

Shao, Xiao, Wu, Guoqiang

arXiv.org Machine Learning

In multi-task learning (MTL) with each task involving graph-dependent data, generalization results of existing theoretical analyses yield a sub-optimal risk bound of $O(\frac{1}{\sqrt{n}})$, where $n$ is the number of training samples. This is attributed to the lack of a foundational sharper concentration inequality for multi-graph dependent random variables. To fill this gap, this paper proposes a new corresponding Bennett inequality, enabling the derivation of a sharper risk bound of $O(\frac{\log n}{n})$. Specifically, building on the proposed Bennett inequality, we propose a new corresponding Talagrand inequality for the empirical process and further develop an analytical framework of the local Rademacher complexity to enhance theoretical generalization analyses in MTL with multi-graph dependent data. Finally, we apply the theoretical advancements to applications such as Macro-AUC Optimization, demonstrating the superiority of our theoretical results over previous work, which is also corroborated by experimental results.
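To see why the second rate is sharper, it may help to compare the two expressions directly. The short calculation below ignores the constants hidden by the $O(\cdot)$ notation and uses the natural logarithm, so the numbers are only indicative.

$$
\frac{\log n / n}{1/\sqrt{n}} = \frac{\log n}{\sqrt{n}} \longrightarrow 0 \quad \text{as } n \to \infty .
$$

For example, at $n = 10^{4}$ we have $\frac{1}{\sqrt{n}} = 10^{-2}$, while $\frac{\log n}{n} \approx \frac{9.21}{10^{4}} \approx 9.2 \times 10^{-4}$, roughly an order of magnitude smaller.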


Sharper Bounds for Proximal Gradient Algorithms with Errors

Hamadouche, Anis, Wu, Yun, Wallace, Andrew M., Mota, Joao F. C.

arXiv.org Machine Learning

We analyse the convergence of the proximal gradient algorithm for convex composite problems in the presence of gradient and proximal computational inaccuracies. We derive new tighter deterministic and probabilistic bounds that we use to verify a simulated (MPC) and a synthetic (LASSO) optimization problem solved on a reduced-precision machine in combination with an inaccurate proximal operator. We also show how the probabilistic bounds are more robust for algorithm verification and more accurate for application performance guarantees. Under some statistical assumptions, we also prove that some cumulative error terms follow a martingale property. Consistent with observations, e.g., in \cite{schmidt2011convergence}, we also show how the acceleration of the algorithm amplifies the gradient and proximal computational errors.
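The kind of setting described here, a proximal gradient iteration whose gradient and proximal steps are both computed inexactly, can be sketched on a tiny LASSO problem. This is an illustrative sketch only, not the authors' experiment: rounding to a fixed number of decimals is an assumed stand-in for reduced-precision arithmetic, and the names `inexact_ista` and `decimals` are hypothetical.

```python
def soft_threshold(v, t):
    """Proximal operator of t * |v| (scalar soft-thresholding)."""
    if v > t:
        return v - t
    if v < -t:
        return v + t
    return 0.0


def inexact_ista(A, b, lam, step, iters, decimals=None):
    """Proximal gradient (ISTA) for 0.5 * ||Ax - b||^2 + lam * ||x||_1.

    If `decimals` is given, the gradient and the proximal output are both
    rounded to that many decimals, as a crude model of computational error.
    """
    m, n = len(A), len(A[0])
    x = [0.0] * n
    for _ in range(iters):
        # residual r = A x - b
        r = [sum(A[i][j] * x[j] for j in range(n)) - b[i] for i in range(m)]
        # gradient of the smooth part: A^T r
        grad = [sum(A[i][j] * r[i] for i in range(m)) for j in range(n)]
        if decimals is not None:
            grad = [round(g, decimals) for g in grad]  # inexact gradient
        x = [soft_threshold(x[j] - step * grad[j], step * lam) for j in range(n)]
        if decimals is not None:
            x = [round(v, decimals) for v in x]  # inexact proximal step
    return x


A = [[1.0, 0.0], [0.0, 1.0]]
b = [3.14159, 0.5]
x_exact = inexact_ista(A, b, lam=1.0, step=1.0, iters=20)
x_low = inexact_ista(A, b, lam=1.0, step=1.0, iters=20, decimals=2)
print(x_exact, x_low)
```

With an identity design matrix the exact minimiser is the soft-thresholded `b`, so the gap between `x_exact` and `x_low` isolates the effect of the injected rounding error.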